Efficient memory representation of XML document trees

Authors

  • Giorgio Busatto
  • Markus Lohrey
  • Sebastian Maneth
Abstract

Implementations that load XML documents and give access to them via, e.g., the DOM suffer from huge memory demands: the space needed to load an XML document is usually many times larger than the size of the document itself. A considerable amount of this memory is needed to store the tree structure of the document. In this paper, a technique is presented that represents the tree structure of an XML document efficiently. The representation exploits the high regularity of XML documents by compressing their tree structure, that is, by detecting and removing repetitions of tree patterns. Formally, context-free tree grammars that generate only a single tree are used for tree compression. Basic tree operations, such as traversal along edges, remain available on the compressed representation. This makes it possible to execute queries (and in particular bulk operations) directly, without prior decompression. The complexity of certain computational problems, such as validation against XML types or testing equality, is investigated for compressed input trees.
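
The idea of removing repetitions can be illustrated with a much simpler special case than the context-free tree grammars used in the paper: sharing identical subtrees so that the tree becomes a DAG. The sketch below is an assumption-laden illustration, not the authors' algorithm; the names `compress`, `count_nodes`, and `make_book` are hypothetical, and whole-subtree sharing captures only part of what pattern sharing can achieve.

```python
# Illustrative sketch only: shares identical SUBTREES (a DAG), which is a
# simpler special case than the tree-pattern sharing described in the abstract.
# All function names here are hypothetical.

def compress(tree, table, nodes):
    """Return the id of the unique DAG node representing `tree`.

    `tree` is a pair (label, [children]). `table` maps a node's
    (label, child ids) key to its id; `nodes` lists keys by id.
    """
    label, children = tree
    key = (label, tuple(compress(c, table, nodes) for c in children))
    if key not in table:
        table[key] = len(nodes)   # first occurrence: allocate a new node
        nodes.append(key)
    return table[key]             # repeated occurrence: reuse the old node


def count_nodes(node_id, nodes, memo=None):
    """Size of the represented tree, computed on the DAG without unfolding it."""
    if memo is None:
        memo = {}
    if node_id not in memo:
        _, kids = nodes[node_id]
        memo[node_id] = 1 + sum(count_nodes(k, nodes, memo) for k in kids)
    return memo[node_id]


def make_book():
    # A small, highly regular record, as is typical of XML data.
    return ("book", [("title", []), ("author", [])])


doc = ("library", [make_book() for _ in range(1000)])
table, nodes = {}, []
root = compress(doc, table, nodes)

print(len(nodes))                 # number of distinct nodes stored
print(count_nodes(root, nodes))   # number of nodes in the original tree
```

Here a bulk operation (counting nodes) runs on the shared structure and visits each distinct node once, which mirrors, in a limited way, the abstract's point about executing queries without prior decompression.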

Similar resources

Inflatable XML Processing

The past few years have seen the widespread adoption of XML as a data representation format in various middleware: databases, Web Services, messaging systems, etc. One drawback of XML has been the high cost of XML processing. We present in this paper InflateX, a system that supports efficient XML processing. InflateX advances the state of the art in two ways. First, it uses a novel representati...

A Partial-tree-based Approach for XPath Query on Large XML Trees

XML is a popular data definition language and is widely used for representation of arbitrary data structures. For queries on XML documents, XPath has commonly been used in many applications. The complexity of applying queries increases as the number of nodes in an XML document increases. Querying very large XML documents becomes really difficult when there is not enough computer memory to store...

Efficient Memory Representation of XML Documents

Implementations that load XML documents and give access to them via, e.g., the DOM, suffer from huge memory demands: the space needed to load an XML document is usually many times larger than the size of the document. A considerable amount of memory is needed to store the tree structure of the XML document. Here a technique is presented that makes it possible to represent the tree structure of an XML docu...

A Simple Optimal Representation for Balanced Parentheses

We consider succinct, or highly space-efficient, representations of a (static) string consisting of n pairs of balanced parentheses, that support natural operations such as finding the matching parenthesis for a given parenthesis, or finding the pair of parentheses that most tightly enclose a given pair. This problem was considered by Jacobson, [Proc. 30th FOCS, 549–554, 1989] and Munro and Ram...

Validating XML Documents in the Streaming Model with External Memory

We study the problem of validating XML documents of size N against general DTDs in the context of streaming algorithms. The starting point of this work is a well-known space lower bound. There are XML documents and DTDs for which p-pass streaming algorithms require Ω(N/p) space. We show that when allowing access to external memory, there is a deterministic streaming algorithm that solves this pr...

Journal:
  • Inf. Syst.

Volume 33

Publication date: 2008